Core to resource mapping and resource to core mapping

ABSTRACT

Core to resource and resource to core mapping is disclosed. In an embodiment, a method includes obtaining an input pattern including a plurality of resource identifiers corresponding to resources. The method further includes applying the input pattern to a guaranteed regular and uniform distribution process to obtain a distribution pattern that indicates a distribution of resources across cores or a distribution of the cores across the resources. The method further includes distributing the resources across the cores or distributing the cores across the resources according to the distribution pattern.

CROSS REFERENCE TO OTHER APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 62/664,781 entitled CORE TO RESOURCE MAPPING AND RESOURCE TO CORE MAPPING filed Apr. 30, 2018, which is incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

Network traffic is conventionally managed by a network device called an application delivery controller (ADC). The ADC manages access to content by handling client requests for content. The ADC load balances incoming client requests to servers. The host machine (ADC) performing the load balancing may have several cores. Conventional load balancers typically load balance any client request on any core to any server, typically by using a counter for each server to track when that server has been used by a core. Tracking these counters is memory-intensive and computationally intensive. In a distributed system, for example, as the number of cores in a host machine increases, the cost and complexity of tracking resource usage increases. Today, the number of cores on a host is typically anywhere from one to 128, and this number will grow as technology progresses. Thus, there is a need to efficiently allocate resources (such as servers) to cores (such as processing units) and vice versa.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram illustrating an application delivery network in which cores can be mapped to resources and resources can be mapped to cores.

FIG. 2 shows an example mapping of resources to cores.

FIG. 3A shows an example mapping of resources to cores according to an embodiment of the present disclosure.

FIG. 3B shows another example mapping of resources to cores according to an embodiment of the present disclosure.

FIG. 4 is a flow chart illustrating an embodiment of a process to map resources to cores or cores to resources.

FIG. 5 is a flow chart illustrating an embodiment of a process to map cores to resources.

FIG. 6 shows an example mapping of cores to resources resulting from the process shown in FIG. 5.

FIG. 7 is a flow chart illustrating an embodiment of a process to map resources to cores.

FIG. 8 is a flow chart illustrating an embodiment of a process to map resources to cores.

FIG. 9 shows an example mapping of resources to cores resulting from the process shown in FIG. 8.

FIG. 10 is a flow chart illustrating an embodiment of a process to map resources to cores in which cores have sub-cores.

FIG. 11 shows an example mapping of resources to sub-cores resulting from the process shown in FIG. 10.

FIG. 12 is a functional diagram illustrating a programmed computer system for mapping resources to cores or cores to resources in accordance with some embodiments.

DETAILED DESCRIPTION

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

FIG. 1 is a block diagram illustrating an application delivery network in which cores can be mapped to resources and resources can be mapped to cores. The mapping can be performed according to the techniques described below. In this example, the application delivery network includes an application delivery controller (ADC) 110, four clients, two servers, and three content files. The clients provide an interface for users to access the content files. The clients can be implemented by any processing device, for example, a smartphone, tablet, desktop computer, etc. The user may access the content via a web browser or other application executing on the clients. ADC 110 receives requests from the clients, and directs requests to the appropriate server(s). The request made by a client may be a request for one or more content files. The servers receive a request processed by ADC 110, determine the requested content, and respond to the requests by the clients. The content files can be content such as media items, files, dynamically generated responses, and the like. The number of clients, CPUs, servers, and content files shown here is merely exemplary and is not intended to be limiting.

In operation, the clients access the content (e.g., Hypertext Markup Language (HTML) files) by sending requests (e.g., Hypertext Transfer Protocol (HTTP) requests specifying the Uniform Resource Identifiers (URIs) corresponding to the HTML files) to ADC 110. In response to receipt of the request, ADC 110 routes the request to one or more of the servers. The servers then permit access to the appropriate content file(s).

ADC 110 load balances incoming client side requests to the servers, where the host machine implementing ADC 110 performing the load balancing has several cores available. In the example shown in FIG. 1, the ADC host machine has two cores (CPUs). In a typical network management system, there are 16 servers and 128 cores in the host machine. Given a set of resources (here, servers) to be used by an application, the typical approach is to access all resources on all cores (here, CPUs). Instead of applying the conventional technique of load balancing any client request on any core to any server, the techniques discussed below enable the option to use each server on a subset of cores only, and each core is assigned a fixed number of servers (e.g., just one server). Conversely, given 512 servers and 128 cores, each core is mapped to use four servers only and each server is used by only one core. This provides a flexible approach of mapping resources to a subset of the cores or vice versa.

In various embodiments, ADC 110 includes a distribution engine 102. Distribution engine 102 is configured to perform a distribution process to determine a mapping of resources to cores or cores to resources. Distribution engine 102 is configured to perform the process shown in the figures below, for example. The distribution engine may be implemented by a computer system such as the one shown in FIG. 12.

Although the examples below describe mapping of CPUs to servers in a load balancing context, the techniques also find application in other systems. For example, in a content distribution network, the cores are data centers and the resources are files. In a data center, server boxes/appliances are cores and the resources are files. A core is sometimes called a “processing unit” in this disclosure. A resource contains or generates data, while a core is a processing unit or acts as a gateway to access a resource.

FIG. 2 shows an example mapping of resources to cores. In this example, there are three resources (Resource A, Resource B, and Resource C) and three cores (Core 1, Core 2, and Core 3). A line between a resource and a core represents the core being able to access the resource. Here, all cores are mapped to all resources, which means that all cores can access all resources. This is sometimes referred to as all resources being distributed to all cores or all cores being distributed to all resources. In conventional network management systems, this is a typical mapping because decisions need not be made about how to distribute resources to cores.

As another example, consider the problem of placing a virtual service (resource) on the cores of a host. In some embodiments, a virtual service is a front-end abstraction that a load balancer provides, for example an IP address that an ADC uses to receive client requests. When a client connects to a virtual service address, the ADC processes the client connection or request against a list of settings, policies, and profiles and sends valid client traffic to a back-end server that is a member of the virtual service's pool of servers. In conventional systems, the virtual service is typically placed on all cores of the host, which means that each core is used to run a process that implements the virtual service. Using the example shown in FIG. 2, the virtual services are Resources A, B, C, and they are mapped to Cores 1, 2, 3. In various embodiments, using the techniques discussed below, each virtual service is placed on a subset of the cores of the host machine while imposing the same workload on each core (e.g., where workload is measured by the number of sessions, bandwidth, and the like).

In conventional systems, to prevent overloading of resources, a counter is maintained for each resource. When a resource is used on any core, the counter is incremented. However, depending on the way counters are implemented, issues of contention and concurrency may occur, which can be addressed by various tools such as programming language paradigms, CPU pinning, cache invalidation, locks, messaging, etc. The overhead of maintaining these counters or using these tools is high (e.g., incrementing the counter, locking), and the overhead increases as the number of cores increases.

The performance of the network system can be improved if the overhead is reduced. For example, response times to queries can decrease, and less memory is used because fewer counters need to be stored or less CPU is used due to reduced contention during updates of counters. For a given task at hand, not all resources necessarily have to be used on all cores for the system to perform adequately. In other words, a resource does not need to access all cores in order to meet a service level agreement or to provide sufficient performance. The mapping techniques described below distribute cores to resources or resources to cores such that not necessarily all of the resources are assigned to each core. Instead, a subset of resources may be allocated to a core or a subset of cores may be allocated to a resource. This improves the functioning of a network system because the amount of memory and processing cycles required for processing can be reduced.

Core to resource mapping and resource to core mapping is disclosed. Core affinity, which is the assignment of resources to cores, is determined, and a fixed number of resources are assigned to each core. Conversely, a fixed number of cores can be assigned to each resource. For example, when the resources are servers, a core-to-resource(s) mapping is performed. When the resources are virtual services, the mapping is inverted to obtain a resource-to-core(s) mapping. The resources do not have to be homogeneous or alike. They can be weighted, e.g., servers could have different capacities for handling traffic and may be weighted accordingly. Similarly, virtual services can have weights that denote the relative amount of traffic they are expected to handle in comparison to other virtual services. Cores on a host machine are usually alike in performance, though the techniques described here also find application in the context of cores with different weights.

FIGS. 3A and 3B show example mappings that can be determined by the mapping process further described below.

FIG. 3A shows an example mapping of resources to cores according to an embodiment of the present disclosure. The number of resources and cores in this example is the same as in FIG. 2. However, unlike FIG. 2, each resource accesses two out of the three cores. More specifically, Resource A is mapped to Core 1 and Core 2, Resource B is mapped to Core 1 and Core 3, and Resource C is mapped to Core 2 and Core 3. As a result, if counters are maintained, the number of counters can be decreased and less memory and fewer processing cycles are used to keep track of the loading of a resource. For example, instead of maintaining three counters per resource, only two are used. Inversely, if a single counter per resource is used across cores, updating it requires less CPU overhead due to reduced contention for concurrent updates. Although each resource in this example does not access all of the cores, using two of the cores alone may be sufficient for the resource's needs.

FIG. 3B shows another example mapping of resources to cores according to an embodiment of the present disclosure. The number of resources and cores in this example is the same as in FIG. 2. However, unlike FIG. 2, each resource is allocated a different number of cores. More specifically, Resource A is mapped to Core 1, Core 2, and Core 3; Resource B is mapped to Core 1 and Core 2; and Resource C is mapped to only Core 3. As a result, if counters are maintained, the number of counters can be decreased and less memory and fewer processing cycles are used to keep track of the loading of a resource, or CPU contention is reduced when using a single counter per resource across cores. This mapping is different from the example in FIG. 3A because it accounts for the different needs of resources. The reason may be that Resource A has more needs than Resource B and Resource C. For example, Resource A tends to see more traffic than Resource B and Resource C, and therefore needs more processing power (cores). Thus, Resource A is allocated more cores than the other resources. Another reason for this allocation may be differing capacities of cores. For example, the cores may be implemented by different types of hardware and here Core 3 has the greatest processing ability. Thus, Resource C is mapped solely to Core 3 because Core 3 is adequate for Resource C's needs, and Core 3 is still able to devote some of its remaining processing ability to the other resources.

FIG. 4 is a flow chart illustrating an embodiment of a process to map resources to cores or cores to resources. This process may be implemented by a processor such as the one shown in FIG. 12 or a distribution engine such as the one shown in FIG. 1.

At 402, an input pattern including resource identifiers corresponding to resources is obtained. The input pattern includes an ordered list of identifiers, which identifiers identify resources that are to be mapped to cores (or the cores may be mapped to the resources as further described below). In various embodiments, each resource is represented in the pattern in proportion to its weight, and the resource identifiers are contiguous. Referring to FIG. 6 in which there are three resources A, B, and C, an input pattern (P0) can be “ABC.” In some embodiments, the order may be different, e.g., “BAC.”

Returning to FIG. 4, at 404, the input pattern is applied to a guaranteed regular and uniform distribution process to obtain a distribution pattern. That is, the distribution process guarantees that the output will be regular and uniform from the perspective of the resources or the cores. The distribution process translates and manipulates the input pattern to produce a distribution pattern. As explained below, the distribution pattern can, alone or with other factors, define how resources are distributed to cores or how cores are distributed to resources. An example of a distribution pattern that maps cores to resources is: “1:{A:4, B:2}, 2:{B:2, C:4}, 3:{A:4, B:2}, 4:{B:2, C:4}.” An example of a distribution pattern that maps resources to cores is: “A:{U:4, W:4}, B:{U:2, V:2, W:2, X:2}, C:{V:4, X:4},” where U, V, W, and X are cores. These examples are explained in greater detail below.

In one aspect, the distribution pattern generated by the distribution process is guaranteed to be regular and uniform. This means that the resources are evenly divided among the cores or the cores are evenly divided among the resources. Unlike a round robin distribution process, which will provide an irregular and non-uniform distribution when the number of resources is not evenly divisible by the number of cores or vice versa, the distribution process here guarantees a regular and uniform mapping. In other words, regardless of the number of resources and the number of cores, a mapping results in equal and even sharing of resources or cores. In the case of weighted resources, the uniformity and regularity of the mapping is observed when taking the weights into account.

Examples of distribution processes that can be applied to an input pattern are shown in FIGS. 5, 7, 8, and 10.

At 406, the resources are distributed across the cores or the cores are distributed across the resources according to the distribution pattern. Whether resources are distributed to cores or cores to resources is responsive to a user's selection. Referring to FIG. 6, and the core-to-resource distribution pattern (1:{A:4, B:2}, 2:{B:2, C:4}, 3:{A:4, B:2}, 4:{B:2, C:4}), four portions of Resource A and two portions of Resource B are distributed to Core 1, two portions of Resource B and four portions of Resource C are distributed to Core 2, four portions of Resource A and two portions of Resource B are distributed to Core 3, and two portions of Resource B and four portions of Resource C are distributed to Core 4. Assuming that Resources A, B, and C are equal, the amount of resources distributed to each core is equal.

The input pattern may be contiguous or non-contiguous. An example of a contiguous pattern is: AABBCCCC because identifiers of the same type (A is one type of identifier, B is another type, and C yet another type) are next to each other. An example of a non-contiguous pattern is: ABACB. In various embodiments, if the input pattern is non-contiguous, then prior to applying the input pattern to the distribution process (404), the resource identifiers in the input pattern are sorted to make the input pattern contiguous. In one aspect, a contiguous pattern keeps the mapping small such that the number of cores mapping to resources (or resources mapping to cores) is lower. This reduces the overhead in terms of memory used for counters or CPU contention. As further described below, a non-contiguous pattern can be used to represent placeholder resources/headroom. In some embodiments, the pattern is allowed to be non-contiguous for the headroom identifiers only such that if the headroom resource identifier is excluded, the rest of the pattern is contiguous.
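The contiguity check and the regrouping step can be illustrated with a short Python sketch. The function names `is_contiguous` and `make_contiguous` are hypothetical, and the sketch assumes a pattern is a string of single-character identifiers. It groups identifiers in first-seen order; a plain lexical sort would make the pattern contiguous as well.

```python
from itertools import groupby


def is_contiguous(pattern: str) -> bool:
    """Return True if every identifier occupies one unbroken run."""
    runs = [ident for ident, _ in groupby(pattern)]
    # A repeated run key means the identifier appears in two separate places.
    return len(runs) == len(set(runs))


def make_contiguous(pattern: str) -> str:
    """Group identical identifiers together, keeping first-seen order."""
    order = list(dict.fromkeys(pattern))  # unique identifiers, in order
    return "".join(ident * pattern.count(ident) for ident in order)


print(is_contiguous("AABBCCCC"))  # True
print(is_contiguous("ABACB"))     # False
print(make_contiguous("ABACB"))   # AABBC
```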

In various embodiments, mapping can be performed more than once. For example, resources and cores that have been previously mapped can be added to an input pattern to determine a subsequent mapping of resources to cores or cores to resources.

The following figures are examples of distribution processes. FIGS. 5-7 are examples of core to resource mapping. FIGS. 8-11 are examples of resource to core mapping.

FIG. 5 is a flow chart illustrating an embodiment of a process to map cores to resources. This process may be implemented by a processor such as the one shown in FIG. 12 or a distribution engine such as the one shown in FIG. 1. This process may be performed as part of another process, for example as part of 404 of FIG. 4.

Consider a host machine with c cores and r resources. The process shown in FIG. 5 is performed to achieve a core-to-resource mapping using a pattern that is constructed and/or modified during the process. Suppose the number of cores is c=4 and the number of resources is r=3. Let the r resources be denoted by the symbols A, B, and C. Referring to FIG. 6, the r resources are Resource A, Resource B, and Resource C and the c cores are Core 1, Core 2, Core 3, and Core 4. The initial pattern (input pattern P0) is an ordered list of resource identifiers:

-   -   P0: ABC

In various embodiments, the order is arbitrary but fixed once selected. At 502, a stretch factor is applied to an input pattern to obtain a second pattern. In various embodiments, a goal of the “stretch” step is to build the smallest pattern of resources that can be (but are not yet) evenly divided to cores. In various embodiments, the stretch factor “sf” is given by c/GCD(c, r). Here, GCD is a function that computes a greatest common divisor of its arguments. With c=4, r=3, the stretch factor is 4/1=4. Each resource identifier in the pattern is then stretched by the stretch factor by replacing each resource identifier with “sf” identical resource identifiers.

The input pattern P0 is stretched by replacing each resource identifier with 4 identical resource identifiers to obtain a second pattern P1:

-   -   P1: AAAABBBBCCCC

The length of the obtained pattern (P1) is “r*sf=LCM(c, r)” where LCM is the least common multiple. Thus, the length of this pattern is a multiple of the number of cores. The stretch factor in various embodiments can be any integral multiple of the smallest value, although keeping sf as small as possible helps keep the associated overhead low, at least in this example. The associated overhead can be the number of cores a resource maps to, which determines how many counters are used or how much CPU contention there is for updating a resource counter. In various embodiments, an sf that is an integer guarantees a regular and uniform mapping. For the example above, this smallest sf=4 is used. The input pattern P0 may be obtained earlier, for example at 402 of FIG. 4.
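As an illustration of the stretch step, the following Python sketch computes sf = c/GCD(c, r) and stretches P0. It is a minimal sketch rather than the claimed implementation: the function names `stretch_factor` and `stretch` are hypothetical, and a pattern is assumed to be a string of single-character identifiers whose length is r.

```python
from math import gcd


def stretch_factor(c: int, r: int) -> int:
    """Smallest stretch factor: sf = c / GCD(c, r)."""
    return c // gcd(c, r)


def stretch(pattern: str, sf: int) -> str:
    """Replace each identifier with sf identical identifiers."""
    return "".join(ident * sf for ident in pattern)


c = 4            # number of cores
p0 = "ABC"       # input pattern, so r = len(p0) = 3
sf = stretch_factor(c, len(p0))
p1 = stretch(p0, sf)

print(sf)        # 4
print(p1)        # AAAABBBBCCCC

# The stretched length equals LCM(c, r), hence is a multiple of c.
assert len(p1) == c * len(p0) // gcd(c, len(p0))
assert len(p1) % c == 0
```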

At 504, a repeat factor is applied to the second pattern to obtain a third pattern. In various embodiments, a goal of the “repeat” step is to transform the pattern based on a configurable value called a repeat factor “rf,” which is a positive integer. Let rf=2 in this example. This means that the pattern is rf copies of the second pattern. The second pattern P1 is duplicated (rf−1) times to obtain the third pattern P2:

-   -   P2: AAAABBBBCCCCAAAABBBBCCCC

The repeat factor is user configurable and can be input at a user interface and received by the device performing the process of FIG. 5. In various embodiments, a goal of the “repeat factor” configuration is to provide a parameter to control the number of cores that a resource maps to. As a repeat factor increases, the number of different resources a core uses is likely to increase (but not guaranteed to, due to kerning in the pattern after partitioning as further described below). Similarly, as the repeat factor increases, the number of different cores a resource is mapped to is also likely to increase (again, with allowances for kerning). A user can choose an rf value to control overhead. Typically, a lower repeat factor corresponds to lower overhead at the expense of being relatively less flexible. A lower repeat factor corresponds to less distributed or more restricted mapping.
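The effect of the repeat factor can be explored with a small Python sketch. It assumes the same string representation of patterns and applies the stretch, repeat, and even-partition steps of FIG. 5 in order; the helper name `distinct_resources_per_core` is hypothetical. For c=4 and P0=ABC it reports how many different resources each core ends up using for several rf values.

```python
from math import gcd


def distinct_resources_per_core(p0: str, c: int, rf: int) -> list[int]:
    """Count the different resources each of the c cores receives."""
    sf = c // gcd(c, len(p0))                         # stretch factor
    pattern = "".join(ident * sf for ident in p0) * rf  # stretch, then repeat
    size = len(pattern) // c                          # identifiers per core
    parts = [pattern[i:i + size] for i in range(0, len(pattern), size)]
    return [len(set(part)) for part in parts]


for rf in (1, 2, 3):
    print(rf, distinct_resources_per_core("ABC", 4, rf))
# 1 [1, 2, 2, 1]
# 2 [2, 2, 2, 2]
# 3 [3, 3, 3, 3]
```

In this particular configuration a larger rf spreads each core across more resources, which matches the trend described above; other values of c and r may behave less monotonically.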

At 506, the third pattern is partitioned to obtain a fourth pattern. In some embodiments, the number of partitions into which the third pattern is partitioned is the number of cores to which the resources are to be mapped. Since the length of pattern P1 is an integral multiple of the number of cores, the length of P2 is also a multiple of the number of cores. The third pattern P2 is evenly partitioned into c (here, c=4) partitions, one for each core, to obtain the fourth pattern P3:

-   -   P3: AAAABB−BBCCCC−AAAABB−BBCCCC
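The partition step can be sketched in Python as an even split of the pattern string into c slices. The function name `partition` is hypothetical, and an ASCII hyphen is used as the separator when printing.

```python
def partition(pattern: str, c: int) -> list[str]:
    """Split the pattern into c equal, consecutive partitions (one per core)."""
    if len(pattern) % c != 0:
        raise ValueError("pattern length must be a multiple of the core count")
    size = len(pattern) // c
    return [pattern[i:i + size] for i in range(0, len(pattern), size)]


p2 = "AAAABBBBCCCC" * 2          # third pattern from the example (rf=2)
p3 = partition(p2, 4)
print("-".join(p3))              # AAAABB-BBCCCC-AAAABB-BBCCCC
```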

At 508, the fourth pattern is compressed to obtain a distribution pattern. The distribution pattern can be used to distribute cores across resources as described in 406 of FIG. 4. Compression translates duplicate resource identifiers within a particular partition to a number. Referring to the first partition in pattern P3 (AAAABB), the compression is (A:4, B:2) because there are four instances of A and two instances of B. Compression helps to discover weights for each resource identifier assigned to a core. The fourth pattern P3 is compressed to obtain the distribution pattern P4:

-   -   P4: 1:{A:4, B:2}, 2:{B:2, C:4}, 3:{A:4, B:2}, 4:{B:2, C:4}

In various embodiments, even if the resources are not weighted, the final mapping is weighted. Effectively, each of the r resources has been split into multiple sub-resources. Pattern P4 is read as: Core 1 is allocated to four portions of Resource A and two portions of Resource B. Core 2 is allocated to two portions of Resource B and four portions of Resource C. Core 3 is allocated to four portions of Resource A and two portions of Resource B. Core 4 is allocated to two portions of Resource B and four portions of Resource C. In this example, each core is assigned 6 units of sub-resources, 4 of which are provided by one resource and 2 by another resource.
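The compress step amounts to counting identifiers per partition. A minimal Python sketch, assuming partitions are strings and cores are numbered from 1 (the function name `compress` is hypothetical):

```python
from collections import Counter


def compress(partitions: list[str]) -> dict[int, dict[str, int]]:
    """Map each core number to {resource identifier: number of portions}."""
    return {
        core: dict(Counter(part))
        for core, part in enumerate(partitions, start=1)
    }


p3 = ["AAAABB", "BBCCCC", "AAAABB", "BBCCCC"]
p4 = compress(p3)
print(p4)
# {1: {'A': 4, 'B': 2}, 2: {'B': 2, 'C': 4}, 3: {'A': 4, 'B': 2}, 4: {'B': 2, 'C': 4}}

# Every core receives the same number of sub-resource units.
print({core: sum(alloc.values()) for core, alloc in p4.items()})
# {1: 6, 2: 6, 3: 6, 4: 6}
```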

In this example, some resources are mapped to 2 cores while others are mapped to 4 cores. This effect is called “kerning” because the assignment of resources appears to be non-uniform (some resources are mapped to 2 cores while others are mapped to 4 cores). However, the kerning is simply an artifact because each core has access to the same fraction of total resources. In this example, out of 3 resources, each of the 4 cores has access to 3/4 of the resources via sub-resources, which is an exact division without resorting to fractions, as fraction arithmetic is prone to rounding errors in limited precision hardware. Therefore, the assignment of resources is regular and uniform.
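The kerning observation can be checked numerically. The Python sketch below takes the example pattern P4 as a dictionary (an assumed representation), counts how many cores each resource maps to, and computes each core's share measured in whole-resource units. Exact rational arithmetic from the standard `fractions` module is used here only for verification; the pattern itself contains integers only.

```python
from fractions import Fraction

p4 = {
    1: {"A": 4, "B": 2},
    2: {"B": 2, "C": 4},
    3: {"A": 4, "B": 2},
    4: {"B": 2, "C": 4},
}

totals = {}              # total sub-resource units per resource
cores_per_resource = {}  # number of cores each resource is mapped to
for alloc in p4.values():
    for resource, units in alloc.items():
        totals[resource] = totals.get(resource, 0) + units
        cores_per_resource[resource] = cores_per_resource.get(resource, 0) + 1

# Share of each core, in units of one whole resource.
shares = {
    core: sum(Fraction(units, totals[resource]) for resource, units in alloc.items())
    for core, alloc in p4.items()
}

print(cores_per_resource)                    # {'A': 2, 'B': 4, 'C': 2}
print({k: str(v) for k, v in shares.items()})  # {1: '3/4', 2: '3/4', 3: '3/4', 4: '3/4'}
```

The per-resource core counts differ (the kerning), while every core's share is the same.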

FIG. 6 shows an example mapping of cores to resources resulting from the process shown in FIG. 5. The distribution of cores to resources in this example is according to the example pattern P4 obtained above, i.e., 1:{A:4, B:2}, 2:{B:2, C:4}, 3:{A:4, B:2}, 4:{B:2, C:4}. As shown, Core 1 is allocated to Resource A and Resource B in the ratio 4:2. Core 2 is allocated to Resource B and Resource C in the ratio 2:4. Core 3 is allocated to Resource A and Resource B in the ratio 4:2. Core 4 is allocated to Resource B and Resource C in the ratio 2:4.

In a load balancing example, with this distribution pattern P4, client requests arriving at Core 2 are load balanced to servers (resources) B and C in the ratio 2:4 (in other words, for every 2 requests load balanced to B, there are 4 requests load balanced to C). Core 2 does not use server A. Thus, this is an example in which not all resources are used on all cores, but each core is still able to meet performance requirements. At the same time, each core has access to an equivalent number of sub-resources, which means core-to-resource(s) mapping is uniform.
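How a core turns its weights into per-request decisions is not specified here. One simple possibility, shown as a Python sketch under that assumption, is to cycle through a list in which each server appears as many times as its weight; the function name `server_schedule` is hypothetical.

```python
from collections import Counter
from itertools import cycle, islice


def server_schedule(weights: dict[str, int]):
    """Endless iterator of servers in which each appears `weight` times per round."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return cycle(expanded)


core2 = {"B": 2, "C": 4}                      # Core 2's entry in pattern P4
picks = list(islice(server_schedule(core2), 12))
print("".join(picks))                          # BBCCCCBBCCCC
print(dict(Counter(picks)))                    # {'B': 4, 'C': 8}
```

Over any whole number of rounds the requests reach B and C in the ratio 2:4, and server A is never selected on this core. A production balancer would likely interleave the picks more smoothly.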

In some cases, the resources are not all alike in the sense that some resources have greater capability than others. FIG. 7 is an example process of mapping resources to cores for such a situation in which resources have unequal capabilities. For example, a first resource may be implemented by hardware with greater processing power than a second resource.

FIG. 7 is a flow chart illustrating an embodiment of a process to map resources to cores. This process may be implemented by a processor such as the one shown in FIG. 12 or a distribution engine such as the one shown in FIG. 1. This process may be performed as part of another process, for example as part of 404 of FIG. 4.

The system structure is the same as the one described for FIG. 5 (4 cores and 3 resources) except that the resources have unequal capabilities. The unequal capabilities are accounted for in the initial input pattern by weighting the resource identifiers to account for their varying processing abilities. Suppose that Resource B is twice as capable as Resource A and Resource C is three times as capable as Resource A. The input pattern can encode this information about the unequal capabilities of the resources. For example, input pattern P0 duplicates each resource identifier the number of times corresponding to its ability. In various embodiments, the resource identifier is duplicated adjacently so that the pattern is contiguous, meaning that resource identifiers of the same type are grouped together, like “ABBCCC.”

In the example discussed above, suppose Resources A, B, C have weights 1, 2, and 3 respectively. The input pattern P0 is:

-   -   P0: ABBCCC

The number of resources r is now the sum of the weights of all the resources, here r=6. In some embodiments, all weights may be scaled down by the GCD of all the weights.
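Building a weighted, contiguous input pattern, including the optional scaling by the GCD of the weights, can be sketched in Python as follows. The function name `input_pattern` and the dictionary representation of weights are assumptions made for illustration.

```python
from functools import reduce
from math import gcd


def input_pattern(weights: dict[str, int]) -> str:
    """Repeat each identifier by its weight after dividing out the common GCD."""
    common = reduce(gcd, weights.values())
    return "".join(ident * (w // common) for ident, w in weights.items())


p0 = input_pattern({"A": 1, "B": 2, "C": 3})
print(p0, len(p0))                                # ABBCCC 6

# Weights 2, 4, 6 express the same ratio and scale down to the same pattern.
print(input_pattern({"A": 2, "B": 4, "C": 6}))    # ABBCCC
```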

At 702, a stretch factor is applied to an input pattern to obtain a second pattern. The stretch factor is applied in the same manner as in 502 of FIG. 5. Here, the stretch factor is sf=2 because sf=c/GCD(r, c): with r=6 and c=4, 4/GCD(6, 4)=2. Applying sf=2 to the input pattern P0 obtains the second pattern P1:

-   -   P1: AABBBBCCCCCC

At 704, a repeat factor is applied to the second pattern to obtain a third pattern. The repeat factor is applied in the same manner as in 504 of FIG. 5. Here, suppose repeat factor=2 is selected. Then, applying rf=2 to the second pattern P1 obtains the third pattern P2:

-   -   P2: AABBBBCCCCCCAABBBBCCCCCC

At 706, the third pattern is partitioned to obtain a fourth pattern. The partitioning is performed in the same manner as 506 of FIG. 5. The third pattern P2 is partitioned into c=4 partitions, one per core, to obtain the fourth pattern P3:

-   -   P3: AABBBB−CCCCCC−AABBBB−CCCCCC

At 708, the fourth pattern is compressed to obtain a distribution pattern. The compression is performed in the same manner as 508 of FIG. 5. The fourth pattern P3 is compressed to obtain the distribution pattern P4:

-   -   P4: 1:{A:2, B:4}, 2:{C:6}, 3:{A:2, B:4}, 4:{C:6}

The distribution pattern guarantees a regular and uniform distribution in the desired ratio 1:2:3 with respect to Resources A, B, C. Specifically, 4 portions of Resource A are allocated, 8 portions of Resource B are allocated, and 12 portions of Resource C are allocated. Resources A, B, and C are each assigned to only 2 cores instead of all 4 cores.
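The weighted example can be reproduced end to end with a compact Python sketch that chains the stretch, repeat, partition, and compress steps. The function name `distribute` is hypothetical, and the string and dictionary representations are assumptions for illustration.

```python
from collections import Counter
from math import gcd


def distribute(p0: str, c: int, rf: int = 1) -> dict[int, dict[str, int]]:
    """Stretch, repeat, partition, and compress an input pattern for c cores."""
    sf = c // gcd(c, len(p0))                       # stretch factor
    p1 = "".join(ident * sf for ident in p0)        # stretch
    p2 = p1 * rf                                    # repeat
    size = len(p2) // c
    p3 = [p2[i:i + size] for i in range(0, len(p2), size)]  # partition
    return {core: dict(Counter(part)) for core, part in enumerate(p3, start=1)}


p4 = distribute("ABBCCC", c=4, rf=2)
print(p4)
# {1: {'A': 2, 'B': 4}, 2: {'C': 6}, 3: {'A': 2, 'B': 4}, 4: {'C': 6}}

totals = Counter()
for alloc in p4.values():
    totals.update(alloc)                            # add up portions per resource
print(dict(totals))                                 # {'A': 4, 'B': 8, 'C': 12}
```

The totals 4:8:12 reduce to the intended 1:2:3 ratio, and each core holds 6 portions.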

The distribution process can be applied to a variety of resources including placeholder resources. A placeholder resource is a resource that is reserved for a purpose that is not currently known. Reserving resources can be useful for example to accommodate applications that are as yet unknown. In a network traffic management setting, cores on a host machine are typically virtualized, and a machine may host many applications including some that the administrator is not yet aware of. The placeholder resources can later be used for those unknown applications or for background work.

Placeholder resources can be used with regular resources when performing a distribution process to determine a distribution pattern. In various embodiments, an input pattern is constructed to include one or more placeholder resources. If the number of resources were only 2 (say, A and B), but the core-to-resource(s) mapping is performed with an extra placeholder resource (say C), a mapping is obtained where there are placeholders in the pattern corresponding to where C is mapped to. This corresponds to leaving empty spaces on cores, e.g., to support applications that are as yet unknown or to reserve space for other purposes. The number of such placeholder resources and their interleaving with actual resources in the input pattern P0 can be used to determine mappings that achieve different goals, e.g., leaving a subset of cores with no assigned tasks, or allowing some spare headroom (placeholder) on all cores. For example, more interleaving (e.g., ACBC) corresponds to placeholders on all cores while less interleaving (ABCC) corresponds to leaving an entire core(s) with no assigned tasks.
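The difference between the two interleavings can be seen in a small Python sketch. It assumes a host with c=2 cores and rf=1 (values chosen here for illustration, not taken from the text), treats C as the placeholder identifier, and applies the same stretch, repeat, partition, and compress steps; the function name `distribute` is hypothetical.

```python
from collections import Counter
from math import gcd


def distribute(p0: str, c: int, rf: int = 1) -> dict[int, dict[str, int]]:
    """Stretch, repeat, partition, and compress an input pattern for c cores."""
    sf = c // gcd(c, len(p0))
    p2 = "".join(ident * sf for ident in p0) * rf
    size = len(p2) // c
    parts = [p2[i:i + size] for i in range(0, len(p2), size)]
    return {core: dict(Counter(part)) for core, part in enumerate(parts, start=1)}


# Interleaved placeholders: every core keeps some headroom (C).
print(distribute("ACBC", c=2))   # {1: {'A': 1, 'C': 1}, 2: {'B': 1, 'C': 1}}

# Grouped placeholders: core 2 holds only the placeholder and is left idle.
print(distribute("ABCC", c=2))   # {1: {'A': 1, 'B': 1}, 2: {'C': 2}}
```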

The preceding examples are examples of distributions of cores to resources. In the context of network traffic management, placing virtual services on subsets of the cores of a host machine is an example of a resource-to-core(s) mapping problem, which is an inverse of the core-to-resource(s) mapping problem. The resource-to-core(s) mapping problem can be solved by using a pattern building approach like the one described above. However, instead of the “compress” step (e.g., 508, 708) described above, a “collect” step is used. The following figures show examples of mapping resources to cores.

FIG. 8 is a flow chart illustrating an embodiment of a process to map resources to cores. This process may be implemented by a processor such as the one shown in FIG. 12 or a distribution engine such as the one shown in FIG. 1. This process may be performed as part of another process, for example as part of 404 of FIG. 4.

Consider a host machine with c cores and r resources. As an example, let c=4 and denote the cores by the symbols U, V, W, and X. Let r=3 and denote the resources by the symbols A, B, and C. Thus, the input pattern P0 is:

-   -   P0: ABC

At 802, a stretch factor is applied to an input pattern to obtain a second pattern. The stretch factor is applied in the same manner as in 502 of FIG. 5. Here, the stretch factor is sf=c/GCD(r, c). With r=3 and c=4, sf=4/GCD(3, 4)=4. Applying sf=4 to the input pattern P0 obtains the second pattern P1:

-   -   P1: AAAABBBBCCCC
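The stretch step at 802 can be sketched in Python as follows. The function names stretch_factor and stretch are hypothetical, and the sketch assumes the pattern is a string with one character per resource identifier.

```python
from math import gcd

def stretch_factor(r: int, c: int) -> int:
    """sf = c / GCD(r, c) for r resources and c cores."""
    return c // gcd(r, c)

def stretch(pattern: str, sf: int) -> str:
    """Replicate each identifier sf times, keeping the original order."""
    return "".join(symbol * sf for symbol in pattern)

p0 = "ABC"
sf = stretch_factor(len(p0), 4)   # r=3, c=4
p1 = stretch(p0, sf)
print(sf, p1)                     # 4 AAAABBBBCCCC
```

Because sf is c divided by the greatest common divisor of r and c, the stretched pattern always has a length that is a multiple of c, which is what allows it to be split evenly across the cores later.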

At 804, a repeat factor is applied to the second pattern to obtain a third pattern. The repeat factor is applied in the same manner as in 504 of FIG. 5. Here, suppose a repeat factor rf=2 is selected. Then, applying rf=2 to the second pattern P1 obtains the third pattern P2:

-   -   P2: AAAABBBBCCCCAAAABBBBCCCC

At 806, the third pattern is partitioned to obtain a fourth pattern. The partitioning is performed in the same manner as 506 of FIG. 5. The third pattern P2 is partitioned into c=4 cores to obtain the fourth pattern P3:

-   -   P3: AAAABB−BBCCCC−AAAABB−BBCCCC

At 808, the fourth pattern is collected to obtain a distribution pattern. Each resource is mapped to the core partitions it falls in, and the number of times the resource is present in each core partition is counted. Collecting the fourth pattern P3 obtains the distribution pattern P4, which is a mapping of resources to cores:

-   -   P4: A:{U:4, W:4}, B:{U:2, V:2, W:2, X:2}, C:{V:4, X:4}

Pattern P4 is read as: Resource A is assigned to Core U and Core W in the ratio 4:4. Resource B is assigned to Core U, Core V, Core W, and Core X in the ratio 2:2:2:2. Resource C is assigned to Core V and Core X in the ratio 4:4. Even when the resources are unweighted to begin with, after performing the process of FIG. 8, the mapping discovers weights. Like the example described above, each core is assigned six units of sub-resources across all resources, where each resource comprises eight sub-resources. Hence, as a resource-to-core(s) mapping mechanism, each core is uniformly loaded.
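Steps 802 through 808 can be sketched end to end in Python as follows. This is a minimal illustration rather than a definitive implementation: the function name map_resources_to_cores is hypothetical, and patterns and core labels are assumed to be strings with one character per identifier.

```python
from math import gcd

def map_resources_to_cores(pattern: str, cores: str, rf: int = 1):
    """Stretch, repeat, partition, then collect (steps 802 to 808)."""
    c = len(cores)
    sf = c // gcd(len(pattern), c)                 # 802: stretch factor
    p1 = "".join(s * sf for s in pattern)          # 802: second pattern
    p2 = p1 * rf                                   # 804: third pattern
    size = len(p2) // c                            # symbols per core partition
    p3 = [p2[i * size:(i + 1) * size] for i in range(c)]  # 806: fourth pattern
    p4: dict[str, dict[str, int]] = {}             # 808: collect
    for core, part in zip(cores, p3):
        for s in part:
            counts = p4.setdefault(s, {})
            counts[core] = counts.get(core, 0) + 1
    return p3, p4

p3, p4 = map_resources_to_cores("ABC", "UVWX", rf=2)
print("-".join(p3))   # AAAABB-BBCCCC-AAAABB-BBCCCC
print(p4)             # {'A': {'U': 4, 'W': 4}, 'B': {'U': 2, 'V': 2, 'W': 2, 'X': 2}, 'C': {'V': 4, 'X': 4}}

# Per-core load: sum each core's units across all resources.
core_load = {core: sum(w.get(core, 0) for w in p4.values()) for core in "UVWX"}
print(core_load)      # {'U': 6, 'V': 6, 'W': 6, 'X': 6}
```

The final dictionary confirms the uniform loading noted above: every core carries six units, and every resource is split into eight units.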

The example described with respect to FIG. 7 in which resources are weighted also applies to the process described here. For example, the input pattern would be formed to reflect the weights of the resources and the process of FIG. 8 is performed to obtain a distribution of resources to cores.
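A weighted variant can be sketched as follows. The weights 1:2:3, the resulting input pattern ABBCCC, the choice rf=2, and the function name collect are assumptions made for illustration, and the stretch factor is assumed to be computed from the length of the weighted input pattern.

```python
from collections import Counter
from math import gcd

def collect(pattern: str, cores: str, rf: int = 1) -> dict[str, dict[str, int]]:
    """Stretch, repeat, and partition the pattern, then count symbols per core."""
    c = len(cores)
    sf = c // gcd(len(pattern), c)
    p2 = "".join(s * sf for s in pattern) * rf
    size = len(p2) // c
    mapping: dict[str, dict[str, int]] = {}
    for i, core in enumerate(cores):
        part = p2[i * size:(i + 1) * size]
        for symbol, count in Counter(part).items():
            mapping.setdefault(symbol, {})[core] = count
    return mapping

weights = {"A": 1, "B": 2, "C": 3}                 # assumed weights
p0 = "".join(s * w for s, w in weights.items())    # "ABBCCC"
p4 = collect(p0, "UVWX", rf=2)
print(p4)      # {'A': {'U': 2, 'W': 2}, 'B': {'U': 4, 'W': 4}, 'C': {'V': 6, 'X': 6}}

totals = {s: sum(m.values()) for s, m in p4.items()}
print(totals)  # {'A': 4, 'B': 8, 'C': 12}
```

Under these assumptions the totals keep the 1:2:3 ratio, each core still receives six units, and each resource lands on two of the four cores.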

FIG. 9 shows an example mapping of resources to cores resulting from the process shown in FIG. 8. The distribution of resources to cores in this example is according to the example pattern P4 obtained above, i.e., A:{U:4, W:4}, B:{U:2, V:2, W:2, X:2}, C:{V:4, X:4}. As shown, Resource A maps to Cores U and W in the ratio 4:4, while Resource B maps to all cores with weight 2, and Resource C maps to Cores V and X in the ratio 4:4. For simplicity, the ratios in the figure have been reduced so that Resource A maps to Cores U and W in the ratio 1:1. Alternatively, Resource A can be pictured with further subdivisions to map in the ratio 4:4.
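The ratio simplification mentioned for the figure can be sketched as dividing each resource's per-core counts by their greatest common divisor. The function name simplify is hypothetical, and this is only one way such a reduction might be done.

```python
from functools import reduce
from math import gcd

def simplify(weights: dict[str, int]) -> dict[str, int]:
    """Divide one resource's per-core counts by their common divisor."""
    g = reduce(gcd, weights.values())
    return {core: w // g for core, w in weights.items()}

p4 = {
    "A": {"U": 4, "W": 4},
    "B": {"U": 2, "V": 2, "W": 2, "X": 2},
    "C": {"V": 4, "X": 4},
}
simplified = {resource: simplify(w) for resource, w in p4.items()}
print(simplified["A"])   # {'U': 1, 'W': 1}
```

Reducing each resource separately keeps the split of that resource across its cores but drops the relative scale between resources, so the unreduced P4 is still needed when comparing load across resources.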

In some embodiments, virtual partitioning of a core into sub-cores at the hardware, hypervisor, or operating system level is possible, and the resource-to-core(s) mapping can use the concept of sub-cores to enable mapping at a finer granularity. The following figure illustrates one such example.

FIG. 10 is a flow chart illustrating an embodiment of a process to map resources to cores in which cores have sub-cores. This process may be implemented by a processor such as the one shown in FIG. 12 or a distribution engine such as the one shown in FIG. 1. This process may be performed as part of another process, for example as part of 404 of FIG. 4.

Unlike the example of FIG. 9, in which the four cores are undivided, each core in this example has sub-cores. Referring to FIG. 10, each core U, V, W, X comprises a number (sc) of sub-cores. In one aspect, this sub-core abstraction may be useful when the number of resources is much larger than the number of cores and the operating system provides constructs to schedule tasks on a core in a weighted manner. Here, there are 3 sub-cores per core (sc=3), which is represented as shown. Sub-cores U1, U2, U3 make up Core U; sub-cores V1, V2, V3 make up Core V; sub-cores W1, W2, W3 make up Core W; and sub-cores X1, X2, X3 make up Core X. As in the example of FIG. 8, the resources are A, B, and C, so the input pattern P0 is:

-   -   P0: ABC

At 1002, a stretch factor is applied to an input pattern to obtain a second pattern. The stretch factor is applied in the same manner as in 802 of FIG. 8. The computation of the stretch factor involves replacing the total number of cores c with the total number of sub-cores c*sc. Here, c*sc=12, so sf=12/GCD(3, 12)=4. Applying sf=4 to the input pattern P0 obtains the second pattern P1:

-   -   P1: AAAABBBBCCCC

At 1004, a repeat factor is applied to the second pattern to obtain a third pattern. The repeat factor is applied in the same manner as in 804 of FIG. 8. Here, suppose a repeat factor rf=2 is selected. Then, applying rf=2 to the second pattern P1 obtains the third pattern P2:

-   -   P2: AAAABBBBCCCCAAAABBBBCCCC

At 1006, the third pattern is partitioned to obtain a fourth pattern. The partitioning is performed in the same manner as 806 of FIG. 8. The third pattern P2 is partitioned into c*sc=12 sub-cores to obtain the fourth pattern P3:

-   -   P3: AA−AA−BB−BB−CC−CC−AA−AA−BB−BB−CC−CC

At 1008, the fourth pattern is collected to obtain a distribution pattern. Each resource is mapped to the sub-core partitions it falls in, and the number of times the resource is present in each sub-core partition is counted. Collecting the fourth pattern P3 obtains the distribution pattern P4, which is a mapping of resources to sub-cores:

-   -   P4: A:{U1:2, U2:2, W1:2, W2:2}, B:{U3:2, V1:2, W3:2, X1:2}, C:{V2:2, V3:2, X2:2, X3:2}

Pattern P4 is read as follows. Resource A is allocated to the sub-cores as follows: two portions are assigned to Sub-core U1, two portions are assigned to Sub-core U2, two portions are assigned to Sub-core W1, and two portions are assigned to Sub-core W2. Resource B is allocated to the sub-cores as follows: two portions are assigned to Sub-core U3, two portions are assigned to Sub-core V1, two portions are assigned to Sub-core W3, and two portions are assigned to Sub-core X1. Resource C is allocated to the sub-cores as follows: two portions are assigned to Sub-core V2, two portions are assigned to Sub-core V3, two portions are assigned to Sub-core X2, and two portions are assigned to Sub-core X3.
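The sub-core variant of steps 1002 through 1008 can be sketched as follows. The function name map_resources_to_subcores and the labeling scheme (core letter followed by a 1-based index) are assumptions chosen to match the labels used in this example.

```python
from math import gcd

def map_resources_to_subcores(pattern: str, cores: str, sc: int, rf: int = 1):
    """Same pipeline as for cores, with c replaced by c*sc sub-cores."""
    units = [f"{core}{i}" for core in cores for i in range(1, sc + 1)]
    n = len(units)                                  # c * sc
    sf = n // gcd(len(pattern), n)                  # 1002: stretch factor
    p2 = "".join(s * sf for s in pattern) * rf      # 1002 and 1004
    size = len(p2) // n                             # symbols per sub-core
    p3 = [p2[i * size:(i + 1) * size] for i in range(n)]  # 1006
    p4: dict[str, dict[str, int]] = {}              # 1008: collect
    for unit, part in zip(units, p3):
        for s in part:
            counts = p4.setdefault(s, {})
            counts[unit] = counts.get(unit, 0) + 1
    return p3, p4

p3, p4 = map_resources_to_subcores("ABC", "UVWX", sc=3, rf=2)
print("-".join(p3))   # AA-AA-BB-BB-CC-CC-AA-AA-BB-BB-CC-CC
print(p4["B"])        # {'U3': 2, 'V1': 2, 'W3': 2, 'X1': 2}
```

In this particular example, summing the sub-core counts per core gives the same per-core totals as pattern P4 of FIG. 8, but each sub-core now serves a single resource.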

FIG. 11 shows an example mapping of resources to sub-cores resulting from the process shown in FIG. 10. The distribution of resources to sub-cores in this example is according to the example pattern P4 obtained above, i.e., A:{U1:2, U2:2, W1:2, W2:2}, B:{U3:2, V1:2, W3:2, X1:2}, C:{V2:2, V3:2, X2:2, X3:2}. The diagram has been simplified so that each sub-block of a Resource represents two portions of that resource. As shown, Resource A maps to Sub-cores U1, U2, W1, and W2 in the ratio 2:2:2:2, while Resource B maps to Sub-cores U3, V1, W3, and X1 in the ratio 2:2:2:2, and Resource C maps to Sub-cores V2, V3, X2, and X3 in the ratio 2:2:2:2.

The mapping techniques described above have several benefits. They lower overhead in terms of memory requirements, CPU usage, and concurrency-related coordination because the same performance can be achieved by using only a subset of resources on a core or a subset of cores for a resource. Moreover, in a large scale distributed system with heterogeneous hosts, mapping can be addressed locally at the host without requiring centralized coordination (centralized coordination may lead to issues with scalability).

FIG. 12 is a functional diagram illustrating a programmed computer system for mapping resources to cores or cores to resources in accordance with some embodiments. As will be apparent, other computer system architectures and configurations can be used to perform the described mapping technique. Computer system 1200, which includes various subsystems as described below, includes at least one microprocessor subsystem (also referred to as a processor or a central processing unit (CPU) 1202). For example, processor 1202 can be implemented by a single-chip processor or by multiple processors. In some embodiments, processor 1202 is a general purpose digital processor that controls the operation of the computer system 1200. In some embodiments, processor 1202 also includes one or more coprocessors or special purpose processors (e.g., a graphics processor, a network processor, etc.). Using instructions retrieved from memory 1210, processor 1202 controls the reception and manipulation of input data received on an input device 1206, and the output and display of data on output devices (e.g., display 1218).

Processor 1202 is coupled bi-directionally with memory 1210, which can include, for example, one or more random access memories (RAM) and/or one or more read-only memories (ROM). As is well known in the art, memory 1210 can be used as a general storage area, a temporary (e.g., scratch pad) memory, and/or a cache memory. Memory 1210 can also be used to store input data and processed data, as well as to store programming instructions and data, in the form of data objects and text objects, in addition to other data and instructions for processes operating on processor 1202. Also as is well known in the art, memory 1210 typically includes basic operating instructions, program code, data, and objects used by the processor 1202 to perform its functions (e.g., programmed instructions). For example, memory 1210 can include any suitable computer readable storage media described below, depending on whether, for example, data access needs to be bi-directional or uni-directional. For example, processor 1202 can also directly and very rapidly retrieve and store frequently needed data in a cache memory included in memory 1210.

A removable mass storage device 1212 provides additional data storage capacity for the computer system 1200, and is optionally coupled either bi-directionally (read/write) or uni-directionally (read only) to processor 1202. A fixed mass storage 1220 can also, for example, provide additional data storage capacity. For example, storage devices 1212 and/or 1220 can include computer readable media such as magnetic tape, flash memory, PC-CARDS, portable mass storage devices such as hard drives (e.g., magnetic, optical, or solid state drives), holographic storage devices, and other storage devices. Mass storages 1212 and/or 1220 generally store additional programming instructions, data, and the like that typically are not in active use by the processor 1202. It will be appreciated that the information retained within mass storages 1212 and 1220 can be incorporated, if needed, in standard fashion as part of memory 1210 (e.g., RAM) as virtual memory.

In addition to providing processor 1202 access to storage subsystems, bus 1214 can be used to provide access to other subsystems and devices as well. As shown, these can include a display 1218, a network interface 1216, an input/output (I/O) device interface 1204, an image processing device 1206, as well as other subsystems and devices. For example, image processing device 1206 can include a camera, a scanner, etc.; I/O device interface 1204 can include a device interface for interacting with a touchscreen (e.g., a capacitive touch sensitive screen that supports gesture interpretation), a microphone, a sound card, a speaker, a keyboard, a pointing device (e.g., a mouse, a stylus, a human finger), a Global Positioning System (GPS) receiver, an accelerometer, and/or any other appropriate device interface for interacting with system 1200. Multiple I/O device interfaces can be used in conjunction with computer system 1200. The I/O device interface can include general and customized interfaces that allow the processor 1202 to send and, more typically, receive data from other devices such as keyboards, pointing devices, microphones, touchscreens, transducer card readers, tape readers, voice or handwriting recognizers, biometrics readers, cameras, portable mass storage devices, and other computers.

The network interface 1216 allows processor 1202 to be coupled to another computer, computer network, or telecommunications network using a network connection as shown. For example, through the network interface 1216, the processor 1202 can receive information (e.g., data objects or program instructions) from another network, or output information to another network in the course of performing method/process steps. Information, often represented as a sequence of instructions to be executed on a processor, can be received from and outputted to another network. An interface card or similar device and appropriate software implemented by (e.g., executed/performed on) processor 1202 can be used to connect the computer system 1200 to an external network and transfer data according to standard protocols. For example, various process embodiments disclosed herein can be executed on processor 1202, or can be performed across a network such as the Internet, intranet networks, or local area networks, in conjunction with a remote processor that shares a portion of the processing. Additional mass storage devices (not shown) can also be connected to processor 1202 through network interface 1216.

In addition, various embodiments disclosed herein further relate to computer storage products with a computer readable medium that includes program code for performing various computer-implemented operations. The computer readable medium includes any data storage device that can store data which can thereafter be read by a computer system. Examples of computer readable media include, but are not limited to: magnetic media such as disks and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as optical disks; and specially configured hardware devices such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), and ROM and RAM devices. Examples of program code include both machine code as produced, for example, by a compiler, or files containing higher level code (e.g., script) that can be executed using an interpreter.

The computer system shown in FIG. 12 is but an example of a computer system suitable for use with the various embodiments disclosed herein. Other computer systems suitable for such use can include additional or fewer subsystems. In some computer systems, subsystems can share components (e.g., for touchscreen-based devices such as smart phones, tablets, etc., I/O device interface 1204 and display 1218 share the touch sensitive screen component, which both detects user inputs and displays outputs to the user). In addition, bus 1214 is illustrative of any interconnection scheme serving to link the subsystems. Other computer architectures having different configurations of subsystems can also be utilized.

Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

What is claimed is:
1-20. (canceled)
21. A method of distributing load across different processor cores of a computer that performs load balancing to distribute flows from clients to servers, the method comprising: obtaining an input pattern including a plurality of identifiers corresponding to a plurality of servers; generating, from the input pattern, a distribution pattern that identifies a distribution of flows from each core to a subset of servers that does not include all the plurality of servers; and distributing, from the cores, the flows across the servers based on the identified distribution pattern.
22. The method of claim 21, wherein the input pattern is a first pattern, and wherein generating the distribution pattern comprises: obtaining a second pattern by applying a stretch factor to the input pattern to replicate each server identifier a plurality of times; obtaining a third pattern by applying a repeat factor to the second pattern, said third pattern including a plurality of instances of the second pattern; partitioning the third pattern into a plurality of distribution sub-patterns each of which corresponds to a particular core and specifies a distribution of flows across the subset of servers associated with the particular core.
23. The method of claim 22 further comprising compressing a fourth pattern resulting from the partitioning of the third pattern to obtain the distribution pattern.
24. The method of claim 21, wherein the input pattern includes only one identifier for each server in the plurality of servers.
25. The method of claim 21, wherein the input pattern includes a plurality of instances of an identifier for at least one server in the plurality of servers, and includes different numbers of instances of two identifiers for at least two servers in the plurality of servers.
26. The method of claim 21, wherein the distribution pattern indicates a distribution of servers across cores.
27. The method of claim 21, wherein the distribution pattern indicates a distribution of the cores across the server.
28. The method of claim 21, wherein the distribution pattern includes a weight value for each server.
29. The method of claim 21, wherein the distribution pattern assigns to each core only a subset of servers in order to decrease amount of resources consumed on the computer to maintain count of flows distributed to each server.
30. The method of claim 21, wherein the input and distribution patterns account for at least one placeholder server for possible future addition of at least one server.
31. A non-transitory machine readable medium storing a program for distributing load across different processor cores of a computer that performs load balancing to distribute flows from clients to servers, the program comprising sets of instructions for: obtaining an input pattern including a plurality of identifiers corresponding to a plurality of servers; generating, from the input pattern, a distribution pattern that identifies a distribution of flows from each core to a subset of servers that does not include all the plurality of servers; and distributing, from the cores, the flows across the servers based on the identified distribution pattern.
32. The non-transitory machine readable medium of claim 31, wherein the input pattern is a first pattern, and wherein the set of instructions for generating the distribution pattern comprises the sets of instructions for: obtaining a second pattern by applying a stretch factor to the input pattern to replicate each server identifier a plurality of times; obtaining a third pattern by applying a repeat factor to the second pattern, said third pattern including a plurality of instances of the second pattern; partitioning the third pattern into a plurality of distribution sub-patterns each of which corresponds to a particular core and specifies a distribution of flows across the subset of servers associated with the particular core.
33. The non-transitory machine readable medium of claim 32, wherein the program further comprises a set of instructions for compressing a fourth pattern resulting from the partitioning of the third pattern to obtain the distribution pattern.
34. The non-transitory machine readable medium of claim 31, wherein the input pattern includes only one identifier for each server in the plurality of servers.
35. The non-transitory machine readable medium of claim 31, wherein the input pattern includes a plurality of instances of an identifier for at least one server in the plurality of servers, and includes different numbers of instances of two identifiers for at least two servers in the plurality of servers.
36. The non-transitory machine readable medium of claim 31, wherein the distribution pattern indicates a distribution of servers across cores.
37. The non-transitory machine readable medium of claim 31, wherein the distribution pattern indicates a distribution of the cores across the server.
38. The non-transitory machine readable medium of claim 31, wherein the distribution pattern includes a weight value for each server.
39. The non-transitory machine readable medium of claim 31, wherein the distribution pattern assigns to each core only a subset of servers in order to decrease amount of resources consumed on the computer to maintain count of flows distributed to each server.
40. The non-transitory machine readable medium of claim 31, wherein the input and distribution patterns account for at least one placeholder server for possible future addition of at least one server.